Hadoop and Spark Performance for the Enterprise by Andy Oram

Author: Andy Oram, Date: March 13, 2018

Language: eng
Format: epub, mobi, pdf
Publisher: O'Reilly Media, Inc.
Published: 2016-07-22T04:00:00+00:00

Log resource usage, recording when a change to container limits was required, and display this information for future use by programmers and administrators

Now we can turn to distributed systems, explore why their resource needs vary, and look at some solutions that improve performance.

Performance Variation in Distributed Processing

Hadoop and Spark jobs are launched, usually through YARN, with fixed resource limits. When organizations use in-house virtualization or a cloud provider, a job is launched inside a VM with specified resources. For instance, Microsoft Azure allows the user to specify the processor speed, the number of cores, the memory, and the available disk size for each job. Amazon Web Services also offers a variety of instance types (e.g., general purpose, compute optimized, memory optimized).
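As a rough illustration of what fixed limits look like at launch time, the following Python sketch assembles a `spark-submit` command line for YARN. The options `--master`, `--num-executors`, `--executor-cores`, and `--executor-memory` are standard `spark-submit` flags; the helper name `build_spark_submit`, the application file `job.py`, and the specific values are hypothetical and chosen only for illustration.

```python
from typing import List


def build_spark_submit(app: str, executors: int, cores: int, memory: str) -> List[str]:
    """Return a spark-submit argument list with resource limits fixed up front.

    The values passed in are illustrative assumptions, not recommendations.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(executors),
        "--executor-cores", str(cores),
        "--executor-memory", memory,
        app,
    ]


if __name__ == "__main__":
    # Hypothetical job: 4 executors, each with 2 cores and 4 GB of memory.
    print(" ".join(build_spark_submit("job.py", 4, 2, "4g")))
```

Every number in this command is chosen before the job starts, so the request cannot follow the job's actual needs as they change during the run.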

Hadoop uses cgroups, a Linux feature for isolating groups of processes and setting resource limits. cgroups can theoretically change some resources dynamically during a run, but are not used for that purpose by Hadoop or Spark. cgroups’ control over disk and network I/O resources is limited.
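To make the idea of a cgroup limit concrete, here is a minimal Python sketch that reads the memory limit of the current cgroup. It assumes a cgroup v2 unified hierarchy mounted at `/sys/fs/cgroup`, where the `memory.max` file holds either a byte count or the word `max` for "no limit". Older cgroup v1 systems use different paths, and none of this is taken from Hadoop's own code. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: cgroup v2 unified hierarchy mounted at /sys/fs/cgroup.
CGROUP_MEMORY_MAX = Path("/sys/fs/cgroup/memory.max")


def parse_memory_max(raw: str) -> Optional[int]:
    """Parse the contents of memory.max: a byte count, or "max" for no limit."""
    value = raw.strip()
    return None if value == "max" else int(value)


def read_memory_limit(path: Path = CGROUP_MEMORY_MAX) -> Optional[int]:
    """Return the limit in bytes, or None if unlimited or the file is unavailable."""
    try:
        return parse_memory_max(path.read_text())
    except (FileNotFoundError, PermissionError):
        return None


if __name__ == "__main__":
    limit = read_memory_limit()
    print("no memory limit found" if limit is None else f"limit: {limit} bytes")
```

Because the limit is just a value in a file, it can in principle be rewritten while the processes run. This is the dynamic adjustment that the text says Hadoop and Spark do not use.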

But as explained earlier, the resource needs of distributed processing can actually swing widely, just like those of operating system processes. There are several reasons for these shifts.

First, an organization multitasks. In an attempt to reduce costs, it schedules multiple jobs on a physical or virtual system. Under favorable conditions, all jobs can run in a reasonable time and maximize the use of physical resources. But if two jobs spike in resource usage at the same time, one or both can suffer. The host system cannot determine that one has a higher priority and give it more resources.
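The multitasking problem can be sketched with a toy model in Python. The demand figures below are invented for illustration, and `contention_steps` is a hypothetical helper. It reports the time steps at which the combined demand of two jobs exceeds what the host can supply.

```python
from typing import List


def contention_steps(job_a: List[float], job_b: List[float], capacity: float) -> List[int]:
    """Return the time steps at which combined demand exceeds host capacity."""
    return [t for t, (a, b) in enumerate(zip(job_a, job_b)) if a + b > capacity]


if __name__ == "__main__":
    # Hypothetical CPU demand (in cores) sampled at six time steps.
    job_a = [2, 2, 6, 6, 2, 2]
    job_b = [2, 6, 2, 6, 6, 2]
    print(contention_steps(job_a, job_b, capacity=8))  # prints [3]
```

In this made-up trace the two jobs fit on an 8-core host whenever only one of them spikes. They collide only at step 3, when both spike together.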

Second, each type of job has reasons for spiking or, in contrast, drastically reducing its use of resources. HBase, for instance, suffers resource swings for the same reasons as other databases. It might have a period of no queries, followed by a period of many simultaneous queries. A query might transfer just one record or millions of records. It might require a search through huge numbers of records—taking up disk I/O, network I/O, and CPU time—or be able to consult an index to bypass most of these burdens. And HBase can launch background tasks (such as compacting) when other jobs happen to be spiking, as well.

MapReduce jobs are unaffected by outside queries but switch frequently between CPU-intensive and I/O-intensive tasks for their own reasons. At the beginning, a map task opens files from the local disk or via HDFS and does seeks on disk to locate data. It then reads large quantities of data. The strain on I/O is then replaced by a strain on computing to perform the map calculations. During calculations, it performs I/O in bursts by writing intermediate output to disk. It might then send data over the network to the reducers. The same kinds of resource swings occur for reduce tasks and for Spark. Each phase can last seconds or minutes.
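A small Python sketch can show what these phase changes mean for a fixed allocation. The phase names follow the description above, but the durations and core counts are hypothetical, as is the helper `cpu_utilization`. It computes the time-weighted fraction of a fixed CPU allocation that a task actually uses.

```python
from typing import List, Tuple

# (phase name, duration in seconds, CPU cores actually used); hypothetical values
Phase = Tuple[str, float, float]


def cpu_utilization(phases: List[Phase], allocated_cores: float) -> float:
    """Time-weighted fraction of a fixed CPU allocation that the task uses."""
    total_time = sum(duration for _, duration, _ in phases)
    used = sum(duration * cores for _, duration, cores in phases)
    return used / (total_time * allocated_cores)


if __name__ == "__main__":
    map_task = [
        ("seek and read input", 30.0, 0.5),
        ("map computation", 60.0, 4.0),
        ("send output to reducers", 30.0, 0.5),
    ]
    print(cpu_utilization(map_task, allocated_cores=4.0))  # prints 0.5625
```

With the allocation sized for the compute peak, this invented task uses just over half of its reserved CPU. The rest sits idle during the I/O-heavy phases.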

Figure 1-1 shows seven of the many statistics tracked by Pepperdata. Although Pepperdata tracks hardware usage for every individual process (container or
